ILLIAC II is a large, high-speed, general-purpose computer built by
the Digital Computer Laboratory, University of Illinois, Urbana. Comprehensive
plans for its construction were given in a widely quoted 1957 report [38]. No
similarly  comprehensive  post-construction  report  exists,  although  a  number  of 
papers  describing  various  aspects  of  the  computer  have  been  published.   This 
bibliography  lists  these  papers  and  provides  a  short  description  of  the  computer 
as  a  guide  to  the  entries. 

The  papers  cited  fall  into  two  classes.   The  first  class  is  the  open 
literature  consisting  of  journal  articles,  symposia  proceedings,  and  the  like. 
The  second  class  is  Digital  Computer  Laboratory  Reports.   These  are  included 
because  they  are  fairly  widely  held  in  the  libraries  of  computer  organizations 
and  they  have  been  cited  in  the  literature.   The  internal  Digital  Computer 
Laboratory  documents  related  to  construction  are  not  cited  here. 

History 

Planning  for  ILLIAC  II  began  June  1,  1956,  and  culminated  in  1957  in 
a report describing the proposed design [38]. Design began in 1957 and final
chassis construction began in 1960. In 1962, the two controls, arithmetic unit
and  core  memory  began  operation  with  paper  tape  input  and  output.   At  the  present 
time  the  machine  is  essentially  complete  and  in  use,  and  work  continues  on  the 
addition  of  input-output  devices  and  other  peripheral  equipment.   The  work  has 
been  supported  jointly  by  the  University  of  Illinois,  the  Atomic  Energy 
Commission  and  the  Office  of  Naval  Research.   The  IBM  Corporation  donated  a 
number  of  input-output  devices. 


(The Digital Computer Laboratory was recently renamed the Department of Computer Science.)

Organization

ILLIAC  II  is  a  highly  parallel  computer,  with  three  simultaneously 
operating  controls.   Operations  of  the  floating-point  arithmetic  unit  are 
controlled by an arithmetic control. Transfer of data between the core memory
and  the  slower  memories  is  controlled  by  an  interplay  control.   Other  control 
functions  are  performed  by  a  supervisory  control  called  Advanced  Control.   Among 
the  functions  of  Advanced  Control  are  fetching  and  storing  of  operands,  address 
construction  and  indexing,  and  partial  decoding  of  the  orders  for  the  other  two 
controls. The Advanced Control order code is rather elaborate, and in conjunction
with the 13-bit registers in the fast memory it provides for a large variety
of  13-bit  fixed-point  arithmetic  and  logical  operations,  except  multiplication 
and  division. 

The  hierarchy  of  memories  consists  of: 

1. Fast transistor memory, 10 words, 0.2 μsec.

2. Core memory, 8,192 words (soon to be 12,288 words), 1.8 μsec.

3. Drum memory, 65,536 words, 8.5 msec average access time, 7.8 μsec word period.

4. Magnetic tapes and disk files.

The  order  code  contains  long  and  short  instructions.   A  13-bit  short 
instruction,  which  occupies  only  a  quarter  word,  contains  four  bits  to  specify 
an index register containing an operand or an address. A 26-bit long instruction
contains in addition a 13-bit address. Long instructions may be packed
two to a word. Two words of orders are held in the fast memory. This makes it
possible  to  execute  a  loop  of  up  to  eight  short  instructions  (two  words)  without 
any  instruction  fetches  from  the  core  memory.   If  at  the  same  time  the  operands 
are  held  in  the  ten-word  fast  memory,  a  very  fast  loop  can  be  written. 
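
The quarter-word arithmetic behind this can be sketched in a few lines of Python. This is only an illustration: the ordering of quarter words within a 52-bit word (first quarter most significant) and the names pack_word and unpack_word are assumptions, not details taken from the order code.

```python
QUARTER_BITS = 13                 # one short instruction or one index register
QUARTERS_PER_WORD = 4             # 4 x 13 = 52 bits per word
QUARTER_MASK = (1 << QUARTER_BITS) - 1


def pack_word(quarters):
    """Pack four 13-bit quarter words into one 52-bit word.

    Assumption: the first quarter is placed in the most significant bits.
    """
    if len(quarters) != QUARTERS_PER_WORD:
        raise ValueError("a word holds exactly four quarter words")
    word = 0
    for q in quarters:
        if not 0 <= q <= QUARTER_MASK:
            raise ValueError("quarter word does not fit in 13 bits")
        word = (word << QUARTER_BITS) | q
    return word


def unpack_word(word):
    """Split a 52-bit word back into its four 13-bit quarter words."""
    return [
        (word >> (QUARTER_BITS * i)) & QUARTER_MASK
        for i in reversed(range(QUARTERS_PER_WORD))
    ]
```

Two such words hold the eight short instructions of the loop described above; a 26-bit long instruction would occupy two adjacent quarters.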

A detailed consideration of the size and speed requirements of the
various parts of the machine for several classes of problems is given in Taub,
et al. [38], which also contains an early version of the order code. Consideration
of problem types is also contained in Taub [39]. More detailed descriptions
of the organization and the order code are contained in Gillies [8], [9], [11].
Up-to-date details are given in the ILLIAC II programmer's manual [5].

Arithmetic  Unit 

The arithmetic unit is asynchronous, double-precision, floating-point.
It is radix 4 in almost all respects. Single-precision operands are 52 bits
long, with a 45-bit fraction and a 7-bit exponent (base 4) in radix complement
representation. The range of normalized single-precision numbers in the memory
is

    4^{-64} < |x| < 4^{63}


Results  of  most  arithmetic  operations  are  not  normalized  and  the  programmer  is 
free  to  normalize  or  not  as  he  stores  them.   To  aid  in  fixed-point  programming, 
orders  are  provided  which  force  the  exponent  to  one  of  three  values,  with 
corresponding  shifts  in  the  fraction  part.   The  roundoff  which  occurs  when 
storing  a  double-precision  arithmetic  result  in  the  single-precision  memory  is 
obtained  by  adding  1  or  0  to  the  last  retained  fraction  bit  for  discarded 
fractions  greater  or  less  than  one-half  respectively.   The  equality  case  is 
made  dependent  on  the  (presumably  random)  last  retained  bit  to  produce  an 
unbiased  roundoff. 
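
The roundoff rule can be illustrated with a short Python sketch on nonnegative integer fractions. The text does not say in which direction the equality case is resolved; the sketch assumes that 1 is added when the last retained bit is 1 (the familiar round-half-to-even rule). The function name round_fraction is hypothetical, and signs, radix complement representation and fraction overflow are ignored.

```python
def round_fraction(value, discarded_bits):
    """Round a nonnegative integer fraction by dropping its low-order bits.

    value          -- the double-length fraction, as an integer
    discarded_bits -- number of low-order bits to discard (at least 1)
    """
    if discarded_bits < 1:
        raise ValueError("at least one bit must be discarded")
    kept = value >> discarded_bits
    remainder = value & ((1 << discarded_bits) - 1)
    half = 1 << (discarded_bits - 1)
    if remainder > half:
        return kept + 1          # discarded part exceeds one-half: add 1
    if remainder < half:
        return kept              # discarded part below one-half: add 0
    # Equality case. ASSUMPTION: add the last retained bit, so that
    # ties go up about half the time and the roundoff is unbiased.
    return kept + (kept & 1)
```

Under this assumed tie rule the result of an exact tie always has a zero in its last retained bit.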

A  number  of  features  are  provided  to  increase  the  speed  of  operation. 
Redundant  number  representations  and  separate  carry  storage  are  used  within 
part  of  the  arithmetic  unit  to  eliminate  carry  propagation  during  repeated 
additions such as occur in multiplication. In general a carry bit is provided
for each two fraction bits. Multiplier digits, originally having values 0, 1,
2, 3, are recoded to the range -1, 0, 1, 2, and two-at-a-time shifts are provided.
Two adders are provided so that addition may be performed both while gating
from the accumulator (A, Q) to the temporary accumulator (S, R) and vice versa.
Radix 4 division was considered by Robertson [30], but rejected in favor of
redundant binary nonrestoring division, wherein the quotient digits are generated
as -1, 0, +1 and then recoded as base 4 digits with values between -3 and +3.
Carries are assimilated before a store, since the other parts of the computer
Carries  are  assimilated  before  a  store,  since  the  other  parts  of  the  computer 
do  not  use  redundant  number  representation. 
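
The multiplier recoding can be illustrated with a small Python sketch. It uses the standard carry-based scheme for mapping radix 4 digits 0, 1, 2, 3 onto -1, 0, 1, 2; whether the ILLIAC II circuits generate the recoded digits in exactly this way is not stated here, and the function names are hypothetical.

```python
def to_radix4_digits(n):
    """Return the radix 4 digits (0..3) of a nonnegative integer, low digit first."""
    digits = []
    while n:
        digits.append(n & 3)
        n >>= 2
    return digits or [0]


def recode(digits):
    """Recode digits 0..3 into the range -1..2, low digit first.

    A digit (plus incoming carry) of 3 or 4 is replaced by -1 or 0
    and a carry of 1 is passed to the next higher digit.
    """
    out = []
    carry = 0
    for d in digits:
        t = d + carry
        if t >= 3:
            out.append(t - 4)
            carry = 1
        else:
            out.append(t)
            carry = 0
    if carry:
        out.append(1)            # final carry becomes an extra leading digit
    return out


def value(digits):
    """Evaluate a low-digit-first radix 4 digit list."""
    return sum(d * 4 ** i for i, d in enumerate(digits))
```

With digits limited to -1, 0, 1, 2, each multiplication step needs only the multiplicand, its double, or its negative, which is what makes the two-at-a-time shift practical.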

The  floating-point  arithmetic  unit  as  constructed  is  described 
theoretically  in  Robertson  [31]  and  in  detail  in  Penhollow  [19].   Earlier  plans 
were described in Taub, et al. [38] and Wheeler [40]. In addition there were a
number  of  earlier  studies.   These  included  redundant  number  representations  by 
Avizienis  [1],  [2],  [3]  and  Metze  [14],  use  of  redundant  number  representation 
in  the  whole  computer  instead  of  just  the  arithmetic  unit  by  Metze  and 
Robertson  [15],  separate  carry  storage  adders  by  Takahashi  [37],  efficient 
multiplier  and  division  recodings  by  Penhollow  [18],  and  efficient  division  by 
Robertson [30], Metze [16] and Shively [34].

Speed  Independence  and  Control  Design 

Theories  of  asynchronous  circuits  and  speed  independence  were  studied 
extensively  prior  to  construction.   The  speed  independence  problem  is  stated 
physically  and  theoretically  in  Taub,  et  al.  [38].   Detailed  theoretical  studies 
are in Muller and Bartky [17], Shelly [33], and Bartky [4]. A circuit is
speed-independent if its function does not depend on the speeds at which its
constituent  parts  operate.   Advantages  of  speed  independence  are  increased 
reliability  and  ease  of  maintenance. 

The realization of speed independence used in the controls of
ILLIAC II involves the collection of reply signals to ensure that all the
operations which must be performed at each step are complete before going on
to the next step. Some of the problems involved in designing the arithmetic
control in this way are described in Swartwout [35], Robertson [32], and
Gillies  [10].   Advanced  Control  was  designed  in  a  similar  but  not  identical  way. 
The arithmetic unit was not made speed-independent, to avoid increasing its
complexity and cost and decreasing its speed. The electro-mechanical peripheral
devices are inherently synchronous, but the philosophy of speed independence was
partly extended to them by the provision of replies and alarms for many of the
control signals.

A  theoretical  study  of  methods  of  designing  a  speed  independent 
control,  including  the  method  actually  used  for  the  arithmetic  control,  is 
contained  in  Swartwout  [36]. 

Speeds 

Some approximate operation times are as follows:

    Floating add or subtract     2.5 to 3.5 μsec
    Floating multiply            6.3 μsec
    Floating divide              16.0 μsec
    Indexing                     1.0 μsec
    13-bit integer orders        2.0 μsec
    Fast memory                  0.2 μsec
    Core memory                  1.8 μsec

The  times  shown  for  arithmetic  do  not  include  instruction  or  operand 
accessing  times  because  Advanced  Control  performs  memory  accesses  concurrently 
with  arithmetic,  usually  with  zero  net  time  charges.   Instruction  decoding, 
address  construction  and  indexing  are  similarly  overlapped  with  arithmetic, 
and  most  absorb  no  effective  time  at  all. 

Fast Memory

Ten  words  of  very  fast  storage  are  provided,  called  the  fast  memory 
or  flow  gating  memory.   These  ten  registers  are  composed  of  transistor  flipflops 
with common input and output buses and special gating arrangements to keep the
number  of  transistors  small.   The  design  achieves  high  speed  and  high  sensitivity 
along  with  the  usually  contradictory  high  stability  by  using  variable  feedback. 
During  the  write-in  operation,  a  gate  signal  lowers  the  average  potential  of  the 
flipflop. This produces two effects: (1) information is allowed to flow into
the  circuit  through  a  diode  from  the  input  bus,  and  (2)  the  feedback  in  the 
flipflop  is  disabled.   This  reduces  the  circuit  to  a  difference  amplifier,  and 
the  information  is  stored  in  the  base-emitter  capacitances.   At  the  end  of  the 
write-in  operation  the  average  potentials  are  raised  back  to  normal,  thus  cutting 
off the input diode and allowing the feedback to permanently store the information.
The operation time is 0.2 μsec. The transistor counts per bit are:
basic  flipflop  2,  output  driver  1,  write  and  read  drivers  and  terminations 
about  2.3. 

The  fast  memory  sits  at  the  "crossroads"  of  the  computer,  and  some  of 
its  registers  are  also  intimately  identified  with  other  parts  of  the  machine, 
e.g., the core memory, Advanced Control, the arithmetic control and the arithmetic
unit. Four of the fast registers are also addressable as quarter words,
thus providing 16 registers of 13 bits each for use as index registers and for
other  purposes. 

The  early  plans  for  the  fast  memory  were  given  in  Taub,  et  al.  [38], 
and Poppelbaum [20], [21], [22]. A brief mention is also made in Poppelbaum [24].
Detailed  experimental  data  on  the  fast  memory,  including  tolerance  analyses, 
waveforms  and  other  details,  is  given  in  Guckel,  Kunihiro  and  Crow  [12].   A 
patent  covering  the  flow-gating  principle  was  issued  in  1962  [25]. 

Core Memory

The  core  memory  was  originally  planned  to  contain  8,192  words  of  52 
bits plus parity each. There were to be two 4,096-word modules, with odd
addresses in one module and even addresses in the other to halve the average
access time for sequential addresses. The first 4,096-word module was completed
in  1962.   It  was  word  oriented,  with  one  switch  core  per  word  and  two  data  cores 
per  bit.   Two  data  cores  per  bit  give  bipolar  output  and  a  loading  on  the  switch 
cores  that  was  virtually  independent  of  the  digit  pattern.   Partial  switching 
was  used  to  increase  speed  and  reduce  core  heating.   Readout  was  destructive 
and  a  restoration  cycle  was  provided. 
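
The odd/even interleaving can be sketched as a simple address split. Which module receives the even addresses, and the names below, are assumptions made only for illustration.

```python
ADDRESS_BITS = 13                       # 13-bit address field: 8,192 words
MODULE_WORDS = 4096


def interleave(address):
    """Map a 13-bit word address to (module, offset within module).

    Assumption: even addresses go to module 0 and odd addresses to module 1.
    """
    if not 0 <= address < (1 << ADDRESS_BITS):
        raise ValueError("address does not fit in 13 bits")
    return address & 1, address >> 1
```

Sequential addresses alternate between the two modules, so an access to one module can overlap the restoration cycle of the other.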

Early  plans  for  the  core  memory  were  described  in  Taub,  et  al.  [38]. 
Some  earlier  experiments  were  reported  in  McKay,  Yu,  Pottle  [13].   Detailed  plans 
for the construction of the first 4,096-word module were described in Ray [27].
Theoretical studies of partial switching are contained in Ray [28], [29].

The  first  4,096-word  module  was  finished  in  1962,  and  has  been  in 
operation  since  then  (without  the  interleaved  addresses  feature)  at  a  cycle  time 
of 1.8 μsec. In 1964, a commercial 8,192-word core memory was purchased. The
original 4,096-word module and 4,096 words of the commercial core memory are now
in operation with interleaved addresses. This exhausts the addressing capabilities
of the original 13-bit address field. The addressing scheme is presently
being modified to allow the additional 4,096 words also to be used.

Circuits 

The  basic  circuits  used  in  the  high-speed  portions  of  the  machine  are 
nonsaturating  current  switching  circuits  using  pnp  germanium  mesa  transistors. 
Switching  times  are  10  to  40  nanoseconds.   Early  reports  on  these  circuits  are 
Taub, et al. [38] and Poppelbaum and Wiseman [22]. The actual construction was
based on a revised design completed in the summer of 1960. A patent covering
the asymmetrical flipflop was issued in 1960 [23]. A tutorial description of
some of the memory elements is in Rao [26].

The  slower  parts  of  the  computer  (interplay,  Drum  Memory,  Input-Output 
Channels,  etc.)  contain  a  variety  of  slower  circuits.  These  include  saturating, 
nonsaturating,  current  switching  and  NOR  topologies  using  germanium  transistors. 

The computer contains about 55,000 transistors and 133,000 diodes,
exclusive  of  the  commercially  built  input-output  devices. 

Input-Output and Interrupt

Two input-output systems are provided: a high capacity full word
system  and  a  slower  quarter  word  system. 

Full  word  data  transfers  in  the  memory  hierarchy  are  between  the  core 
memory  and  one  of  the  other  memories  or  devices.   Transfers  between  the  core 
memory  and  the  ten-word  fast  memory  are  supervised  by  Advanced  Control.   All 
other  full  word  transfers  are  performed  by  Interplay,  which  contains  the  necessary 
controls  and  data  buffers.   Interplay  is  a  wired  program  computer  of  a  limited 
sort.   It  begins  a  data  transfer  between  the  core  memory  and  one  of  the  other 
memories  or  devices  in  response  to  a  command  from  Advanced  Control.   After  the 
initial  set-up,  Advanced  Control  and  Interplay  operate  independently  without 
interaction  except  that  they  compete  for  core  memory  accesses.   Each  of  the 
Interplay  Channels  can  be  performing  a  transfer  at  the  same  time.   Currently 
there  are  nine  channels  in  use  out  of  a  possible  32.   The  capacity  of  Interplay 
is one word every 3.5 μsec.

The  slower  input-output  system,  called  the  special  register  system, 
allows Advanced Control to exchange 13-bit characters with up to 64 input-output
registers. Each 13-bit transfer requires Advanced Control to execute one order,
as distinguished from Interplay, which operates in parallel with Advanced Control
and  requires  execution  of  only  two  Advanced  Control  orders  to  transfer  a  block 
of  data,  generally  256  words.   The  special  register  system  is  used  for  low-speed 
input-output  and  to  transmit  control  and  status  information  for  peripheral 
devices. 

An interrupt system is connected to certain bits of the special
registers.   For  example,  when  an  Interplay  channel  completes  the  transfer  of  a 
block  of  data,  a  completion  signal  is  provided  via  one  of  the  special  registers. 
This  may,  if  desired,  interrupt  the  program  then  running  and  call  a  supervisory 
program  to  initiate  another  transfer  or  take  other  action.   The  interrupt  system 
may  also  be  actuated  by  errors,  power  failures,  requests  from  consoles,  etc. 

Magnetic  Drum  Memory 

The  Magnetic  Drum  Memory  stores  65,536  words  on  two  3^00-rpm  drums. 
Each  word  is  stored  as  four  13-bit  characters  plus  parity.   The  character  period 
is 1.95 μsec; the word period is 7.8 μsec. Non-return-to-zero recording is used
at a packing density of 288 bits per inch. Full 52-bit parallel recording with
a 1.95-μsec word period was considered but not used because it would have
required  four  times  as  many  read  and  write  amplifiers  and  it  would  have  almost 
completely  occupied  the  core  memory  while  a  drum  transfer  was  in  progress.   Drum 
data  is  written  and  read  in  256-word  blocks,  with  eight  blocks  per  band,  and 
16 bands per drum. Gaps between the blocks allow for head switching so that
following any block transfer, random access to one of the 16 blocks in the next
sector  may  be  obtained  without  waiting. 
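
The drum geometry above (2 drums, 16 bands per drum, 8 blocks per band, 256 words per block) accounts for the full 65,536 words. The Python sketch below checks that arithmetic and splits a linear word address into its components; the field ordering and the function names are assumptions, since the actual drum addressing format is not given here.

```python
DRUMS = 2
BANDS_PER_DRUM = 16
BLOCKS_PER_BAND = 8
WORDS_PER_BLOCK = 256
WORD_PERIOD_USEC = 7.8

TOTAL_WORDS = DRUMS * BANDS_PER_DRUM * BLOCKS_PER_BAND * WORDS_PER_BLOCK


def split_drum_address(address):
    """Split a linear word address into (drum, band, block, word).

    Assumption: the word-in-block field is least significant, then
    block, band and drum in that order.
    """
    if not 0 <= address < TOTAL_WORDS:
        raise ValueError("address outside the drum memory")
    address, word = divmod(address, WORDS_PER_BLOCK)
    address, block = divmod(address, BLOCKS_PER_BAND)
    drum, band = divmod(address, BANDS_PER_DRUM)
    return drum, band, block, word


def block_transfer_time_usec():
    """Time to transfer one 256-word block at one word per word period."""
    return WORDS_PER_BLOCK * WORD_PERIOD_USEC
```

At 7.8 μsec per word a 256-word block takes about 2 msec to transfer, not counting the gap before it.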

System Programs

The  ILLIAC  II  software  includes  an  assembler  called  NICAP,  a  FORTRAN  II 
translator,  and  an  operating  system  program.   Among  other  things,  NICAP  handles 
the  multiple-orders-per-word  problem  and  translates  complex  address  field 
expressions, including nested parentheses to any depth. Parts of address field
expressions which can be evaluated at translation time are so evaluated. The
remaining additions and subtractions are prepared for execution at run time by
the  13-bit  fixed-point  arithmetic  unit  in  Advanced  Control;  multiplications  and 
divisions  are  prepared  for  execution  by  the  floating-point  arithmetic  unit.   The 
address  field  compilation  algorithm  is  described  in  Gear  [6]. 
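
The idea of evaluating the constant parts of an address field expression at translation time can be illustrated by a generic constant-folding sketch. It is not the algorithm of Gear [6]. The tuple representation of expressions, the use of strings for quantities known only at run time, and the function name fold are all assumptions made for this example, and division is left out.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(expr):
    """Fold the constant subexpressions of a nested expression.

    expr is an int (known at translation time), a str (a quantity
    known only at run time), or a tuple (op, left, right).
    """
    if isinstance(expr, (int, str)):
        return expr
    op, left, right = expr
    left, right = fold(left), fold(right)
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)      # evaluated at translation time
    return (op, left, right)             # left for execution at run time
```

For example, fold(("+", ("*", 2, ("+", 3, 4)), "X")) reduces the constant part to 14 and leaves a single addition involving the run-time quantity "X".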

The  FORTRAN  II  translator  produces  assembly  language  in  a  single  pass. 
Effective  use  of  the  drum  memory  enables  the  translator  to  proceed  without  the 
use  of  magnetic  tapes,  thus  gaining  an  order  of  magnitude  in  speed.   The 
operating  system  program  provides  for  batch  processing.   The  various  system  and 
library  programs  are  described  in  a  user's  manual  [5]  and  in  a  compiler  writer's 
manual [7].